An Exploration into the Intersections of Salaries and Social Identity
In this project, I will use the Salary by Job Title and Country
dataset available in Kaggle.
I have cleaned the data to group the same degrees and similar job
titles. I have also excluded some observations with unique job titles in
order to focus on the most popular and common jobs.
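Research questions:
- Do gender and/or race have any impact on the salary someone receives?
- Do any countries have higher salaries than others? Why might that be?
- Which jobs have higher or lower salaries?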
Motivations: In 2021, about two-thirds of employees in the STEM workforce were men and about one-third were women (NCSES). As a woman going into a STEM field, I thought it would be interesting to see how salaries compared. Because this dataset includes many other variables, I decided to look at other social identities as well.
Definition of Social Identity: Social identity is the part of self-concept that is derived from memberships in social groups or categories (APA). Social identities explored in this project:
- Age
- Nationality
- Race
- Gender
- Education Level
- Occupation
There are 6647 observations and 8 variables in this dataset.
Rows: 6,647
Columns: 8
$ Age <dbl> 32, 28, 36, 29, 42, 31, 26, 38, 29, 48, 35, 40, …
$ Gender <chr> "Male", "Female", "Female", "Male", "Female", "M…
$ `Education Level` <chr> "Bachelor's Degree", "Master's Degree", "Bachelo…
$ `Job Title` <chr> "Software Engineer", "Data Analyst", "Sales Repr…
$ `Years of Experience` <dbl> 5, 3, 7, 2, 12, 4, 1, 10, 3, 18, 6, 14, 2, 16, 7…
$ Salary <dbl> 90000, 65000, 60000, 55000, 120000, 80000, 45000…
$ Country <chr> "UK", "USA", "USA", "USA", "USA", "China", "Chin…
$ Race <chr> "White", "Hispanic", "Hispanic", "Hispanic", "As…
It makes sense that years of experience increase with age, which we can see in the scatter plot. A few people may have started a career younger than most, or started over in a new career later in life, so they don’t quite fit the pattern. Overall, though, there is a strong positive linear pattern in this dataset.
Salary also tends to increase with age. Salary still depends on the job, because a younger person can start in a higher-paying job than an older person holds. Overall, there is a positive linear pattern between salary and age.
Salary Histogram: The distribution of the salaries in this dataset is multimodal. There are many peaks and dips throughout the histogram. The summary statistics for the salary variable are:
Min. 1st Qu. Median Mean 3rd Qu. Max.
25000 70000 115000 115454 160000 250000
Salary per Country: The countries all have similar
salaries.
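The median salaries for each country are:
- United States: $110,000
- Australia: $115,000
- UK: $115,000
- China: $117,460
- Canada: $120,000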
All of the countries have the same minimum salary of $25,000. However, they have different maximums. Canada has the highest and Australia has the lowest.
These were the five most common jobs in this dataset. I created a
separate data frame to examine these five jobs on their own. A data
analyst and software engineer tend to make the most, followed by a
marketing employee, someone who works in human resources, and finally a
developer.
The median salaries for these jobs are:
- Software Engineer: $154,636
- Data Analyst: $150,000
- Marketing: $95,000
- Human Resources: $92,000
- Developer: $70,000
Looking at the education levels among these five jobs, we can see that a large share of data analysts have PhDs. No developers have a PhD, and most have Bachelor’s degrees. Many human resources employees have Master’s degrees. Marketing employees and software engineers most often have Bachelor’s degrees, but a good number have Master’s degrees or PhDs.
I chose five jobs at random that I thought were well known, popular,
and would show interesting results. I then created a separate data frame
to examine these five jobs on their own. A researcher tends to make the
most, followed by a financial advisor which also has the most variance
among the jobs. IT support is next, then an accountant, and finally a
sales representative.
The median salaries for these jobs are:
- Researcher: $160,000
- Financial Advisor: $120,000
- IT Support: $110,000
- Accountant: $55,000
- Sales Representative: $30,000
Looking at the education levels among these five jobs, we can see that all of the accountants have Bachelor’s degrees. Most of the financial advisors have a Bachelor’s degree, with some Master’s degrees and one PhD. IT support staff mostly have Bachelor’s degrees, with some Master’s degrees. All of the researchers except one have PhDs. Surprisingly, a majority of the sales representatives have a Master’s degree.
Conclusions: Looking at salary distributions across the social identities in this dataset, I found that salary increases with age, which corresponds to years of experience. Salaries were very similar among the countries represented in this dataset. The distributions were also similar among races, with the Hispanic group having the lowest median and the mixed-race group having the highest. Males have noticeably higher salaries than females, as shown by the medians and maximums for the two genders. There is also a difference in salaries between education levels: more education is associated with higher salaries. Of the 10 specific careers examined in this project, researchers, software engineers, and data analysts make the most, while developers, accountants, and sales representatives make the least.
Limitations: One limitation of this project is that the dataset contains mostly higher-paying jobs, so it does not accurately describe overall salaries among these groups. For example, the median salary in the U.S. is about $40,000-$50,000, which is about half of what this dataset shows.
Potential Future Directions: Future work could explore more aspects of social identity, such as religion or sexual orientation. It would also be good to look at more countries, including some that are less wealthy than those in this dataset.
Audience: This project would be useful for college graduates or other young people trying to decide what career field to go into. They can look at these statistics and compare them to their own social identities to see what a potential salary could look like for them. It could also be used by people in the workforce to compare their salary to others’.
About the Author: My name is Lindsey Winslow. I am an undergraduate student at the University of Dayton. I am graduating in May 2024 with a Bachelor of Science in Education with a major in Education & Allied Studies, along with a Bachelor of Arts with a major in Mathematics. I am interested in pursuing full-time employment in the corporate data analytics field or in something education-related, such as curriculum development.
You can connect with me on LinkedIn
---
title: "Salaries & Social Identity"
output:
  flexdashboard::flex_dashboard:
    theme:
      version: 4
      bootswatch: default
      navbar-bg: "green"
    orientation: columns
    vertical_layout: fill
    source_code: embed
---
```{r setup, include=FALSE}
library(flexdashboard)
library(tidyverse)
library(DT)
library(plotly)
salary<- read_csv("~/Desktop/MTH 209/SalaryFinal.csv")
```
Overview
===
**An Exploration into the Intersections of Salaries and Social Identity**
In this project, I will use the Salary by Job Title and Country dataset available in [Kaggle](https://www.kaggle.com/datasets/amirmahdiabbootalebi/salary-by-job-title-and-country/data).
I have cleaned the data to group the same degrees and similar job titles. I have also excluded some observations with unique job titles in order to focus on the most popular and common jobs.
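The cleaning code itself is not part of this dashboard, so the following is only a minimal sketch of how inconsistent degree labels could be grouped. The raw labels and the column name `education_raw` are hypothetical and made up for illustration; they may not match the Kaggle file or the cleaning that was actually done.

```r
library(dplyr)
library(stringr)

# Hypothetical raw labels, invented for illustration only
raw <- tibble(
  education_raw = c("Bachelor's", "Bachelor's Degree", "Master's", "phD", "PhD")
)

# Map each spelling variant onto one standard label
cleaned <- raw %>%
  mutate(Education_Level = case_when(
    str_detect(education_raw, regex("^bachelor", ignore_case = TRUE)) ~ "Bachelor's Degree",
    str_detect(education_raw, regex("^master", ignore_case = TRUE)) ~ "Master's Degree",
    str_detect(education_raw, regex("^phd", ignore_case = TRUE)) ~ "PhD",
    TRUE ~ education_raw  # leave any unmatched label unchanged
  ))

cleaned
```

The same pattern could be applied to job titles, with one `str_detect()` rule per job category.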
**Research questions:**
- Do gender and/or race have any impact on the salary someone receives?
- Do any countries have higher salaries than others? Why might that be?
- Which jobs have higher or lower salaries?
**Motivations:** In 2021, about two-thirds of employees in the STEM workforce were men and about one-third were women [(NCSES).](https://ncses.nsf.gov/pubs/nsf23315/report/the-stem-workforce#:~:text=The%20share%20of%20women%20and,(figure%202%2D3)) As a woman going into a STEM field, I thought it would be interesting to see how salaries compared. Because this dataset includes many other variables, I decided to look at other social identities as well.
**Definition of Social Identity:** Social identity is the part of self-concept that is derived from memberships in social groups or categories [(APA).](https://dictionary.apa.org/social-identity)
Social identities explored in this project:
- Age
- Nationality
- Race
- Gender
- Education Level
- Occupation
Data
===
Column {data-width=450}
---
### <b><font size=4><span Style = "color:blue">First 500 Observations</span></font></b>
```{r show_table}
datatable(salary[1:500,], rownames=FALSE, colnames= c("Age", "Gender", "Education Level", "Job Title", "Years of Experience", "Salary", "Country", "Race"), options=list(pageLength=20))
```
Column {data-width=550}
---
### <font size= 4><span Style = "color:blue">Variables</span></font>
There are 6647 observations and 8 variables in this dataset.
- Age: the age of the employee
- Gender: the gender of the employee
- Education Level: the highest degree the employee has earned
- Job Title: the title of the job/job category the employee possesses
- Years of Experience: the number of years the employee has worked in that field
- Salary: the yearly salary of the employee (US Dollars)
- Country: the country in which they are employed
- Race: the race of the employee
A glimpse of the data:
```{r}
glimpse(salary)
```
Age & Experience
===
Column {.tabset data-width=550}
---
### Age vs Years of Experience
```{r}
salary<- salary %>% rename(
Job_Title= `Job Title`,
Education_Level=`Education Level`,
Years_of_Experience=`Years of Experience`)
ggplot(salary, aes(x=Age, y=Years_of_Experience))+
geom_point(color="#458B74")+
labs(title="Age vs Years of Experience", y= "Years of Experience")+
theme(text=element_text(size=20))
```
### Age vs Salary
```{r}
ggplot(salary, aes(x=Age, y=Salary))+
geom_point(color="#6E8B3D")+
labs(title="Age vs Salary")+
theme(text=element_text(size=20))
```
Column {data-width=450}
---
### Analysis
It makes sense that years of experience increase with age, which we can see in the scatter plot. A few people may have started a career younger than most, or started over in a new career later in life, so they don't quite fit the pattern. Overall, though, there is a strong positive linear pattern in this dataset.
Salary also tends to increase with age. Salary still depends on the job, because a younger person can start in a higher-paying job than an older person holds. Overall, there is a positive linear pattern between salary and age.
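One way to put a number on the pattern described above is a Pearson correlation. The sketch below uses only the first twelve `Age` and `Years of Experience` values printed by `glimpse()` on the Data page, so it illustrates the call rather than a result for the full dataset. The vector names are placeholders.

```r
# First twelve values shown in the glimpse() output (not the full dataset)
age <- c(32, 28, 36, 29, 42, 31, 26, 38, 29, 48, 35, 40)
years_experience <- c(5, 3, 7, 2, 12, 4, 1, 10, 3, 18, 6, 14)

# Pearson correlation: values near 1 indicate a strong positive linear pattern
r_age_experience <- cor(age, years_experience)
r_age_experience
```

On the full data frame, the equivalent call would be `cor(salary$Age, salary$Years_of_Experience)`, assuming neither column contains missing values.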
Salary by Country
===
Column {.tabset data-width=550}
---
### Overall Salary Histogram
```{r}
ggplot(salary, aes(x=Salary))+
geom_histogram(fill="#00CD66")+
labs(title="Distribution of Salary")+
theme(text=element_text(size=20))
```
### Salary per Country
```{r}
ggplot(salary, aes(x=Country, y=Salary))+
geom_boxplot(fill="#C1FFC1")+
labs(title="Distribution of Salary by Country", y="Salary")+
theme(text=element_text(size=20))
```
### Map
```{r}
map<- map_data("world")
salary_map<- salary %>%
group_by(Country) %>%
summarize(medsalary=median(Salary))%>%
left_join(map, by=c("Country"="region"))
ggplot(salary_map, aes(long, lat, group=group))+
geom_polygon(aes(fill=medsalary), color="white")+
scale_fill_viridis_c(option="C")+
labs(fill="Median Salary per Country")+
theme_void()+
theme(legend.position="bottom")+
theme(text=element_text(size=8))
```
Column {data-width=450}
---
### Analysis
**Salary Histogram:** The distribution of the salaries in this dataset is multimodal. There are many peaks and dips throughout the histogram. The summary statistics for the salary variable are:
Min. 1st Qu. Median Mean 3rd Qu. Max.
25000 70000 115000 115454 160000 250000
**Salary per Country:** The countries all have similar salaries.
The median salaries for each country are:
- United States: $110,000
- Australia: $115,000
- UK: $115,000
- China: $117,460
- Canada: $120,000
All of the countries have the same minimum salary of $25,000. However, they have different maximums. Canada has the highest and Australia has the lowest.
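The per-country figures above could be reproduced with a grouped summary instead of being typed in by hand. This sketch assumes the column names shown by `glimpse()` and uses only the first six rows visible there, so its output will not match the medians listed above. `salary_sample` and `country_summary` are placeholder names.

```r
library(dplyr)

# First six rows visible in the glimpse() output (not the full dataset)
salary_sample <- tibble(
  Country = c("UK", "USA", "USA", "USA", "USA", "China"),
  Salary  = c(90000, 65000, 60000, 55000, 120000, 80000)
)

# Median, minimum, and maximum salary for each country
country_summary <- salary_sample %>%
  group_by(Country) %>%
  summarize(
    median_salary = median(Salary),
    min_salary    = min(Salary),
    max_salary    = max(Salary),
    n             = n()
  ) %>%
  arrange(median_salary)

country_summary
```

Swapping `Country` for `Race`, `Gender`, or `Education_Level` would give the corresponding summaries for the Social Identities page.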
Social Identities
===
Column {.tabset data-width=550}
---
### People per Race
```{r}
ggplot(salary, aes(x=Race))+
geom_bar(fill="#87CEFA")+
labs(title="Number of People per Race", y="Count")+
theme(text=element_text(size=20))
```
### Salary by Race
```{r}
ggplot(salary, aes(x=Race, y=Salary))+
geom_boxplot(fill="#E0FFFF")+
labs(title="Distribution of Salary by Race")+
theme(text=element_text(size=20))
```
### Salary by Gender
```{r}
ggplot(salary, aes(x=Gender, y=Salary))+
geom_boxplot(fill="#BCEE68")+
labs(title="Distribution of Salary by Gender")+
theme(text=element_text(size=20))
```
### People per Education Level
```{r}
ggplot(salary, aes(x=Education_Level))+
geom_bar(fill="#8B668B")+
labs(title="Number of People per Education Level", y="Count", x="Education Level")+
theme(text=element_text(size=17))
```
### Salary by Education Level
```{r}
ggplot(salary, aes(x=Education_Level, y=Salary))+
geom_boxplot(fill="#FFBBFF")+
labs(title="Distribution of Salary by Education Level", x="Education Level")+
theme(text=element_text(size=16))
```
Column {data-width=450}
---
### Analysis
**Race:** The largest group in this dataset is White, followed closely by Asian, then Black, Mixed, and Hispanic. The Mixed group has the highest median salary, while the Hispanic group has the lowest. All races have the same minimum, but the Asian and Black groups have the highest maximums.
The median salaries for each race are:
- Hispanic: $104,830
- Asian: $115,000
- White: $115,000
- Black: $118,000
- Mixed: $120,000
**Gender:** This boxplot shows that males have higher salaries than females.
The median salaries for each gender are:
- Female: $105,000
- Male: $120,000
**Education Level:** Most people in this dataset have a Bachelor's degree, followed by a Master's degree, then a PhD. Overall, people with Bachelor's degrees have the lowest salaries, those with Master's degrees fall in the middle, and those with PhDs have the highest salaries.
The median salaries for each education level are:
- Bachelor's Degree: $80,000
- Master's Degree: $120,000
- PhD: $170,000
Most Common Jobs
===
Column {.tabset data-width=600}
---
### Salaries for Common Jobs
```{r}
# Keep only the five most common job titles
TopFive<- salary %>%
  filter(Job_Title %in% c("Software Engineer", "Marketing", "Data Analyst",
                          "Developer", "Human Resources"))
ggplot(TopFive, aes(x=Job_Title, y=Salary))+
geom_boxplot(fill="#00CD66")+
labs(title="Distribution of Salary by Job Title", subtitle = "Most Common Five", x="Job Title")+
theme(text=element_text(size=15),
axis.text.x=element_text(size=10))
```
### Education Level for Common Jobs
```{r}
ggplot(TopFive, aes(x=Job_Title, fill=Education_Level))+
geom_bar(position="fill")+
scale_y_continuous(breaks=seq(0,1,by=0.2), labels=scales::percent)+
labs(title="Distribution of Education by Job", subtitle = "Most Common Five", x="Job Title", y="Percent of People", fill="Education Level")+
theme(text=element_text(size=15),
axis.text.x=element_text(size=7))
```
Column {data-width=400}
---
### Analysis
These were the five most common jobs in this dataset. I created a separate data frame to examine these five jobs on their own. A data analyst and software engineer tend to make the most, followed by a marketing employee, someone who works in human resources, and finally a developer.
The median salaries for these jobs are:
- Software Engineer: $154,636
- Data Analyst: $150,000
- Marketing: $95,000
- Human Resources: $92,000
- Developer: $70,000
Looking at the education levels among these five jobs, we can see that a large share of data analysts have PhDs. No developers have a PhD, and most have Bachelor's degrees. Many human resources employees have Master's degrees. Marketing employees and software engineers most often have Bachelor's degrees, but a good number have Master's degrees or PhDs.
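The most common job titles can also be found programmatically rather than listed by hand. The rows in the toy data below are invented for illustration only, and `job_counts`, `top_titles`, and `top_jobs` are placeholder names.

```r
library(dplyr)

# Toy data: made-up rows, not taken from the real dataset
jobs <- tibble(
  Job_Title = c("Software Engineer", "Data Analyst", "Software Engineer",
                "Marketing", "Data Analyst", "Software Engineer")
)

# Count rows per job title, most frequent first
job_counts <- jobs %>% count(Job_Title, sort = TRUE)

# Keep the most frequent titles (n = 2 here; n = 5 would mirror this page)
top_titles <- job_counts %>% slice_head(n = 2) %>% pull(Job_Title)

# Subset the data to those titles
top_jobs <- jobs %>% filter(Job_Title %in% top_titles)
top_jobs
```

Deriving the titles from the counts keeps the subset in step with the data if the cleaning step ever changes.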
Random Jobs
===
Column {.tabset data-width=600}
---
### Salaries for Random Jobs
```{r}
# Keep only the five hand-picked job titles
RandomFive<- salary %>%
  filter(Job_Title %in% c("Sales Representative", "Financial Advisor", "Researcher",
                          "Accountant", "IT Support"))
ggplot(RandomFive, aes(x=Job_Title, y=Salary))+
geom_boxplot(fill="#4EEE94")+
labs(title="Distribution of Salary by Job Title", subtitle="Random Five", x="Job Title")+
theme(text=element_text(size=15),
axis.text.x=element_text(size=9))
```
### Education Level for Random Jobs
```{r}
ggplot(RandomFive, aes(x=Job_Title, fill=Education_Level))+
geom_bar(position="fill")+
scale_y_continuous(breaks=seq(0,1,by=0.2), labels=scales::percent)+
labs(title="Distribution of Education by Job", subtitle = "Random Five", x="Job Title", y="Percent of People", fill="Education Level")+
theme(text=element_text(size=15),
axis.text.x=element_text(size=6))
```
Column {data-width=400}
---
### Analysis
I chose five jobs at random that I thought were well known, popular, and would show interesting results. I then created a separate data frame to examine these five jobs on their own. A researcher tends to make the most, followed by a financial advisor which also has the most variance among the jobs. IT support is next, then an accountant, and finally a sales representative.
The median salaries for these jobs are:
- Researcher: $160,000
- Financial Advisor: $120,000
- IT Support: $110,000
- Accountant: $55,000
- Sales Representative: $30,000
Looking at the education levels among these five jobs, we can see that all of the accountants have Bachelor's degrees. Most of the financial advisors have a Bachelor's degree, with some Master's degrees and one PhD. IT support staff mostly have Bachelor's degrees, with some Master's degrees. All of the researchers except one have PhDs. Surprisingly, a majority of the sales representatives have a Master's degree.
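The shares shown in the stacked bar chart can be tabulated directly, which makes statements about how many people in a job hold each degree easy to check. The rows below are invented toy data, and `education_share` is a placeholder name.

```r
library(dplyr)

# Toy data: made-up rows, not taken from the real dataset
jobs <- tibble(
  Job_Title       = c("Researcher", "Researcher", "Researcher",
                      "Accountant", "Accountant"),
  Education_Level = c("PhD", "PhD", "Master's Degree",
                      "Bachelor's Degree", "Bachelor's Degree")
)

# Count each education level within a job, then convert counts to proportions
education_share <- jobs %>%
  count(Job_Title, Education_Level) %>%
  group_by(Job_Title) %>%
  mutate(share = n / sum(n)) %>%
  ungroup()

education_share
```

The `n` column gives the raw counts behind each proportion, which helps when a category contains only one or two people.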
Conclusion
===
**Conclusions:** Looking at salary distributions across the social identities in this dataset, I found that salary increases with age, which corresponds to years of experience. Salaries were very similar among the countries represented in this dataset. The distributions were also similar among races, with the Hispanic group having the lowest median and the mixed-race group having the highest. Males have noticeably higher salaries than females, as shown by the medians and maximums for the two genders. There is also a difference in salaries between education levels: more education is associated with higher salaries. Of the 10 specific careers examined in this project, researchers, software engineers, and data analysts make the most, while developers, accountants, and sales representatives make the least.
**Limitations:** One limitation of this project is that the dataset contains mostly higher-paying jobs, so it does not accurately describe overall salaries among these groups. For example, the median salary in the U.S. is about \$40,000-\$50,000, which is about half of what this dataset shows.
**Potential Future Directions:** Future work could explore more aspects of social identity, such as religion or sexual orientation. It would also be good to look at more countries, including some that are less wealthy than those in this dataset.
**Audience:** This project would be useful for college graduates or other young people trying to decide what career field to go into. They can look at these statistics and compare them to their own social identities to see what a potential salary could look like for them. It could also be used by people in the workforce to compare their salary to others'.
**About the Author:** My name is Lindsey Winslow. I am an undergraduate student at the University of Dayton. I am graduating in May 2024 with a Bachelor of Science in Education with a major in Education & Allied Studies, along with a Bachelor of Arts with a major in Mathematics. I am interested in pursuing full-time employment in the corporate data analytics field or in something education-related, such as curriculum development.
You can connect with me on [LinkedIn](https://www.linkedin.com/in/lindsey-winslow-79b537306/)
Social Identities
People per Race
Salary by Race
Salary by Gender
People per Education Level
Salary by Education Level
Analysis
Race: The largest group in this dataset is White, followed closely by Asian, then Black, Mixed, and Hispanic. The Mixed group has the highest median salary, while the Hispanic group has the lowest. All races have the same minimum, but the Asian and Black groups have the highest maximums.
The median salaries for each race are:
- Hispanic: $104,830
- Asian: $115,000
- White: $115,000
- Black: $118,000
- Mixed: $120,000
Gender: This boxplot shows that males have higher salaries than females. The median salaries for each gender are:
- Female: $105,000
- Male: $120,000
Education Level: Most people in this dataset have a Bachelor’s degree, followed by a Master’s degree, then a PhD. Overall, people with Bachelor’s degrees have the lowest salaries, those with Master’s degrees fall in the middle, and those with PhDs have the highest salaries.
The median salaries for each education level are:
- Bachelor’s Degree: $80,000
- Master’s Degree: $120,000
- PhD: $170,000